NSF PAR Search | NSF Public Access Repository


Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition

https://doi.org/10.21437/Interspeech.2022-816

Ni, Junrui; Wang, Liming; Gao, Heting; Qian, Kaizhi; Zhang, Yang; Chang, Shiyu; Hasegawa-Johnson, Mark (September 2022, Proc. Interspeech 2022)

An unsupervised text-to-speech synthesis (TTS) system learns to generate speech waveforms corresponding to any written sentence in a language by observing: 1) a collection of untranscribed speech waveforms in that language; 2) a collection of texts written in that language without access to any transcribed speech. Developing such a system can significantly improve the availability of speech technology to languages without a large amount of parallel speech and text data. This paper proposes an unsupervised TTS system based on an alignment module that outputs pseudo-text and another synthesis module that uses pseudo-text for training and real text for inference. Our unsupervised system can achieve comparable performance to the supervised system in seven languages with about 10-20 hours of speech each. A careful study on the effect of text units and vocoders has also been conducted to better understand what factors may affect unsupervised TTS performance. The samples generated by our models can be found at https://cactuswiththoughts.github.io/UnsupTTS-Demo, and our code can be found at https://github.com/lwang114/UnsupTTS.
Full Text Available
Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition

https://doi.org/10.18653/v1/2022.acl-long.553

Wang, Liming; Feng, Siyuan; Hasegawa-Johnson, Mark; Yoo, Chang (January 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))
Muresan, Smaranda; Nakov, Preslav; Villavicencio, Aline (Ed.)
Phonemes are defined by their relationship to words: changing a phoneme changes the word. Learning a phoneme inventory with little supervision has been a longstanding challenge with important applications to under-resourced speech technology. In this paper, we bridge the gap between the linguistic and statistical definition of phonemes and propose a novel neural discrete representation learning model for self-supervised learning of phoneme inventory with raw speech and word labels. Under mild assumptions, we prove that the phoneme inventory learned by our approach converges to the true one with an exponentially low error rate. Moreover, in experiments on TIMIT and Mboshi benchmarks, our approach consistently learns a better phoneme-level representation and achieves a lower error rate in a zero-resource phoneme recognition task than previous state-of-the-art self-supervised representation learning algorithms.
Full Text Available
Align or attend? Toward More Efficient and Accurate Spoken Word Discovery Using Speech-to-Image Retrieval

https://doi.org/10.1109/ICASSP39728.2021.9414418

Wang, Liming; Wang, Xinsheng; Hasegawa-Johnson, Mark; Scharenborg, Odette; Dehak, Najim (June 2021, ICASSP)
Multimodal word discovery (MWD) is often treated as a byproduct of the speech-to-image retrieval problem. However, our theoretical analysis shows that some kind of alignment/attention mechanism is crucial for an MWD system to learn meaningful word-level representations. We verify our theory by conducting retrieval and word discovery experiments on MSCOCO and Flickr8k, and empirically demonstrate that both neural MT with self-attention and statistical MT achieve word discovery scores that are superior to those of a state-of-the-art neural retrieval system, outperforming it by 2% and 5% in alignment F1 score, respectively.
Full Text Available
A DNN-HMM-DNN Hybrid Model for Discovering Word-Like Units from Spoken Captions and Image Regions

https://doi.org/10.21437/Interspeech.2020-1148

Wang, Liming; Hasegawa-Johnson, Mark (October 2020, Interspeech)
Discovering word-like units without textual transcriptions is an important step in low-resource speech technology. In this work, we demonstrate a model inspired by statistical machine translation and hidden Markov model/deep neural network (HMM-DNN) hybrid systems. Our learning algorithm is capable of discovering the visual and acoustic correlates of distinct words in an unknown language by simultaneously learning the mapping from image regions to concepts (the first DNN), the mapping from acoustic feature vectors to phones (the second DNN), and the optimum alignment between the two (the HMM). In the simulated low-resource setting using MSCOCO and SpeechCOCO datasets, our model achieves 62.4% alignment accuracy and outperforms the audio-only segmental embedded GMM approach on standard word discovery evaluation metrics.
Full Text Available
Structure of polymer-capped gold nanorods binding to model phospholipid monolayers

https://doi.org/10.1088/2515-7639/abedcd

Quan, Peiyu; Bu, Wei; Wang, Liming; Chen, Chunying; Wu, Xiaochun; Heffern, Charlie; Lee, Ka Yee; Meron, Mati; Lin, Binhua (April 2021, Journal of Physics: Materials)
Full Text Available
Multimodal Word Discovery and Retrieval With Spoken Descriptions and Visual Concepts

https://doi.org/10.1109/TASLP.2020.2996082

Wang, Liming; Hasegawa-Johnson, Mark (January 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing)
Full Text Available
Model-Free Temporal Difference Learning for Non-Zero-Sum Games

https://doi.org/10.1109/IJCNN.2019.8851866

Wang, Liming; Yang, Yongliang; Ding, Dawei; Yin, Yixin; Guo, Zhishan; Wunsch, Donald C. (July 2019, International Joint Conference on Neural Networks (IJCNN))

Full Text Available
Stability of Ligands on Nanoparticles Regulating the Integrity of Biological Membranes at the Nano–Lipid Interface

https://doi.org/10.1021/acsnano.9b00114

Wang, Liming; Quan, Peiyu; Chen, Serena H.; Bu, Wei; Li, Yu-Feng; Wu, Xiaochun; Wu, Junguang; Zhang, Leili; Zhao, Yuliang; Jiang, Xiaoming; et al (July 2019, ACS Nano)

Full Text Available
Template-free fabrication of vertically-aligned polymer nanowire array on the flat-end tip for quantifying the single living cancer cells and nanosurface interaction

https://doi.org/10.1016/j.mfglet.2018.03.004

Wang, Biran; Wang, Liming; Li, Xi; Liu, Yuchen; Zhang, Zimeng; Hedrick, Erik; Safe, Stephen; Qiu, Jingjing; Lu, Guohui; Wang, Shiren (April 2018, Manufacturing Letters)

Full Text Available
Polymer composites-based thermoelectric materials and devices

https://doi.org/10.1016/j.compositesb.2017.04.019

Wang, Liming; Liu, Yuchen; Zhang, Zimeng; Wang, Biran; Qiu, Jingjing; Hui, David; Wang, Shiren (August 2017, Composites Part B: Engineering)

Full Text Available
